272 PART 5 Looking for Relationships with Correlation and Regression
as time goes on, you may want to perform a regression analysis to see whether the
upward trend is statistically significant (meaning it is unlikely to be due to natural
random fluctuations alone). If it is, you may want to create an estimate of the annual
rate of increase, including a standard error (SE) and confidence interval (CI).
Some analysts use ordinary least-squares regression as described in Chapter 16 on
such data, but event counts don’t really meet the least-squares assumptions, so
the approach is not technically correct. Event counts aren’t well approximated as
continuous, normally distributed data unless the counts are very large. Also, their
variability (as measured by the standard deviation) is neither constant nor
proportional to the counts themselves. So straight-line or multiple least-squares
regression is not the best choice for event count data.
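To see why the constant-variability assumption fails, here is a minimal sketch that assumes the counts follow a Poisson distribution (the usual model for independent random events). The function names `poisson_pmf` and `poisson_mean_and_sd` are made up for this illustration. The code adds up the Poisson probabilities numerically and reports the mean and standard deviation (SD) for several expected counts.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected count is `mean`."""
    # Work on the log scale so large counts don't overflow mean**k or k!.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def poisson_mean_and_sd(mean: float) -> tuple:
    """Compute the mean and SD numerically from the Poisson probabilities."""
    # Sum far enough into the upper tail that the leftover probability is negligible.
    upper = int(mean + 20 * math.sqrt(mean) + 30)
    ks = range(upper + 1)
    m = sum(k * poisson_pmf(k, mean) for k in ks)
    var = sum((k - m) ** 2 * poisson_pmf(k, mean) for k in ks)
    return m, math.sqrt(var)

for expected in (4, 25, 100, 400):
    m, sd = poisson_mean_and_sd(expected)
    print(f"mean={m:7.2f}  SD={sd:6.2f}  SD/mean={sd / m:.3f}")
```

Under this assumption the SD equals the square root of the mean, so it comes out at about 2, 5, 10, and 20 for the four expected counts. The SD keeps growing, so it isn't constant, and the SD-to-mean ratio keeps shrinking, so it isn't proportional to the count either.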
Because independent random events like highway accidents should follow a Poisson
distribution (see Chapter 24), they should be analyzed by a kind of regression
designed for Poisson outcomes. And (surprise, surprise) this type of specialized
regression is called Poisson regression.
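As a rough sketch of what a Poisson regression does with yearly event counts, the code below fits the model log(expected count) = b0 + b1 × (years since the first year) by maximum likelihood. Everything specific here is an assumption made for illustration: the counts are invented, the function name `fit_poisson_trend` is hypothetical, and Newton-Raphson is just one simple way to maximize the likelihood (statistical packages may use other algorithms). The 95 percent CI uses the usual normal approximation (estimate ± 1.96 × SE).

```python
import math

# Hypothetical data, invented for illustration only: accidents per year.
years = list(range(2010, 2020))
counts = [12, 15, 11, 18, 17, 21, 19, 25, 24, 28]

def fit_poisson_trend(t, y, max_iter=50, tol=1e-10):
    """Fit log(expected count) = b0 + b1*t by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of the slope.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from a flat trend
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * ti) for ti in t]  # current expected counts
        # Score vector: gradient of the Poisson log-likelihood
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(ti * (yi - mi) for ti, yi, mi in zip(t, y, mu))
        # Information matrix (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(ti * mi for ti, mi in zip(t, mu))
        i11 = sum(ti * ti * mi for ti, mi in zip(t, mu))
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) times score
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # SE of the slope from the inverse information at the last iterate,
    # which differs from the final estimate by less than tol on convergence.
    se_b1 = math.sqrt(i00 / det)
    return b0, b1, se_b1

t = [yr - years[0] for yr in years]  # years since the first year
b0, b1, se = fit_poisson_trend(t, counts)

z = b1 / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
rate_ratio = math.exp(b1)                   # multiplicative change per year
ci_low = math.exp(b1 - 1.96 * se)
ci_high = math.exp(b1 + 1.96 * se)

print(f"annual rate ratio = {rate_ratio:.3f} "
      f"(95% CI {ci_low:.3f} to {ci_high:.3f}), p = {p_value:.4f}")
```

Because the model works on the log scale, the slope b1 is the change in the log of the expected count per year. Exponentiating it gives a rate ratio, so a value of 1.05, for example, would mean a 5 percent increase per year.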
Introducing the generalized linear model
Most statistical software packages don’t offer a command or function explicitly
called Poisson regression. Instead, they offer a more general regression technique
called the generalized linear model (GLM).
Don’t confuse the generalized linear model with the very similarly named general
linear model. It’s unfortunate that these two names are almost identical, because
they describe two very different things. The general linear model is usually
abbreviated LM, and the generalized linear model is abbreviated GLM, so we will use
those abbreviations. (However, some older textbooks may use GLM to mean LM,
because they were written before the generalized linear model, which was introduced
in the early 1970s, came into common use.)
GLM is similar to LM in that the predictor variables usually appear in the model as
the familiar linear combination:
$$c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots$$
where the x’s are the predictor variables, and the c’s are the regression coefficients
(with c0 being called a constant term, or intercept).
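The linear combination can be sketched in a few lines of code. The function name `linear_predictor` and the numbers below are hypothetical, chosen only to show the arithmetic for a single observation with three predictors.

```python
def linear_predictor(coefs, xs):
    """Return c0 + c1*x1 + c2*x2 + ... for one observation.

    coefs = [c0, c1, c2, ...] (intercept first) and xs = [x1, x2, ...].
    """
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one coefficient per predictor plus the intercept")
    intercept, slopes = coefs[0], coefs[1:]
    return intercept + sum(c * x for c, x in zip(slopes, xs))

# Hypothetical coefficients and predictor values, for illustration only
coefs = [2.0, 0.5, -1.0, 0.25]
xs = [4.0, 1.0, 8.0]
value = linear_predictor(coefs, xs)  # 2.0 + 0.5*4.0 - 1.0*1.0 + 0.25*8.0
print(value)
```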
But GLM extends the capabilities of LM in two important ways:
» With LM, the outcome is assumed to be a continuous, normally distributed
variable. But with GLM, the outcome can be continuous or an integer. It can